38 research outputs found

    Distributed Data Storage with Minimum Storage Regenerating Codes - Exact and Functional Repair are Asymptotically Equally Efficient

    We consider a setup where a file of size M is stored in n distributed storage nodes, using an (n,k) minimum storage regenerating (MSR) code, i.e., a maximum distance separable (MDS) code that also allows efficient exact repair of any failed node. The problem of interest in this paper is to minimize the repair bandwidth B for exact regeneration of a single failed node, i.e., the minimum data to be downloaded by a new node to replace the failed node by its exact replica. Previous work has shown that a bandwidth of B = M(n-1)/(k(n-k)) is necessary and sufficient for functional (not exact) regeneration. It has also been shown that if k <= max(n/2, 3), then there is no extra cost of exact regeneration over functional regeneration. The practically relevant setting of low redundancy, i.e., k/n > 1/2, remains open for k > 3, and it has been shown that there is an extra bandwidth cost for exact repair over functional repair in this case. In this work, we adapt to the distributed storage context an asymptotically optimal interference alignment scheme previously proposed by Cadambe and Jafar for large wireless interference networks. With this scheme we solve the problem of repair bandwidth minimization for (n,k) exact-MSR codes for all (n,k) values, including the previously open case of k > max(n/2, 3). Our main result is that, for any (n,k) and sufficiently large file sizes, there is no extra cost of exact regeneration over functional regeneration in terms of the repair bandwidth per bit of regenerated data. More precisely, we show that in the limit as M approaches infinity, the ratio B/M = (n-1)/(k(n-k)).
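To make the bandwidth formula in the abstract concrete, the following minimal sketch evaluates B = M(n-1)/(k(n-k)) for one parameter choice and compares it with downloading the whole file. The function name and the example values (M = 1200, n = 10, k = 6) are illustrative assumptions, not taken from the paper.

```python
from fractions import Fraction


def msr_repair_bandwidth(M, n, k):
    """Repair bandwidth B = M(n-1)/(k(n-k)) quoted in the abstract.

    Hypothetical helper for illustration: M is the file size, n the number
    of storage nodes, and k the number of nodes sufficient to recover the file.
    """
    if not (0 < k < n):
        raise ValueError("need 0 < k < n")
    return Fraction(M * (n - 1), k * (n - k))


# Illustrative values only (assumed, not from the paper).
M, n, k = 1200, 10, 6

per_node_storage = Fraction(M, k)        # an MDS code stores M/k per node
B = msr_repair_bandwidth(M, n, k)        # download needed to repair one node
naive = Fraction(M)                      # rebuilding by first recovering the whole file

print("stored per node:", per_node_storage)
print("repair bandwidth B:", B)
print("B / M:", B / M)
print("naive / B:", naive / B)
```

With these values the repaired node stores 200 units, the formula gives a download of 450 units, and recovering the whole file first would cost 1200 units.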

    CausalEC: A Causally Consistent Data Storage Algorithm based on Cross-Object Erasure Coding

    Causally consistent distributed storage systems have received significant recent attention due to their potential for providing low-latency data access compared with linearizability. Current causally consistent data stores use partial or full replication to ensure data access to clients in a distributed setting. In this paper, we develop, for the first time, an erasure-coding-based algorithm called CausalEC that ensures causal consistency for a collection of read-write objects stored in a distributed set of nodes over an asynchronous message-passing system. CausalEC can use an arbitrary linear erasure code for data storage, and ensures the liveness and storage properties prescribed by the erasure code. CausalEC retains a key benefit of previously designed replication-based algorithms: every write operation is local, that is, a server performs only local actions before returning to a client that issued a write operation. For servers that store certain objects in an uncoded manner, read operations to those objects also return locally. In general, a read operation to an object can be returned by a server on contacting a small subset of other servers, so long as the underlying erasure code allows the object to be decoded from that subset. As a byproduct, we develop EventualEC, a new eventually consistent erasure-coding-based data storage algorithm. A novel technical aspect of CausalEC is the use of cross-object erasure coding, where nodes encode values across multiple objects, unlike previous consistent erasure-coding-based solutions. CausalEC navigates the technical challenges of cross-object erasure coding, in particular those pertaining to re-encoding the objects when writes update the values and ensuring that reads are served in the transient state where the system transitions to storing the codeword symbols corresponding to the new object versions. Comment: Revised to include additional acknowledgement
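The idea of cross-object erasure coding mentioned in the abstract can be illustrated with a toy example. The sketch below is not the CausalEC algorithm and includes no consistency protocol; it assumes a simple XOR code over two equal-length objects, with hypothetical server names `s1`, `s2`, `s3`, and shows only decoding from a subset of servers and re-encoding a coded symbol after a write.

```python
def xor_bytes(a, b):
    """Bytewise XOR of two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("values must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


# Two objects (assumed example values of equal length).
x1 = b"alpha-v1"
x2 = b"bravo-v1"

# s1 and s2 store their objects uncoded; s3 stores a symbol
# that mixes both objects (cross-object coding).
servers = {"s1": x1, "s2": x2, "s3": xor_bytes(x1, x2)}

# Reads at s1 or s2 are local. A read of x1 arriving at s3
# needs one helper: s3's symbol XOR x2 from s2 recovers x1.
decoded_x1 = xor_bytes(servers["s3"], servers["s2"])

# A write to x1 requires re-encoding at s3. Because the code is linear,
# s3 can apply the difference (old XOR new) to its stored symbol.
new_x1 = b"alpha-v2"
delta = xor_bytes(x1, new_x1)
servers["s1"] = new_x1
servers["s3"] = xor_bytes(servers["s3"], delta)

# After the update, s3 and s2 together decode the new version.
decoded_new_x1 = xor_bytes(servers["s3"], servers["s2"])
print(decoded_x1, decoded_new_x1)
```

In a real system the update to `s3` would arrive asynchronously, which is the transient state the abstract refers to; this sketch applies it immediately and does not model that.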